Abstract

We analyzed the non-Gaussian nature of network 
traffic using some Internet traffic data. We found 
that (1) the non-Gaussian nature degrades net- 
work performance, (2) it is caused by 'greedy 
flows' that exist with non-negligible probability, 
and (3) a large majority of 'greedy flows' are 
TCP flows having relatively small hop counts, 
which correspond to small round-trip times. We 
conclude that in a network that has greedy flows 
with non-negligible probability, a traffic control- 
ling scheme or bandwidth design that considers 
non-Gaussian nature is essential. 



1 Introduction 

Traffic characterization based on measurements 
is crucial for establishing high-quality perfor- 
mance evaluation and efficient network provi- 
sioning. It has been widely elucidated that in to- 
day's high-speed data networks, self-similarity is 
appropriate for traffic characterization and per- 
formance evaluation since pioneering work con- 
ducted by researchers from Bellcore in the early 
1990s ^. Self-similarity in data network traf- 
fic suggests that traffic variability has 'long-range 
dependence (LRD)', while the classic Poisson 
traffic model is based on the principle that traf- 
fic variability has short-range dependence (ex- 
ponential). In a series of self-similarity related 
studies, it was found that if traffic variability 
has a higher degree of LRD, then network per- 
formance tends to be worse than when it has a 
lower one ^. 



Self-similarity in data networks has been stud- 
ied in terms of its ubiquitous presence and useful- 
ness from various measurements and statistical 
analyses and simulation studies. However, these 
are not sufficient from the viewpoint of evaluating
network performance. That is, in some cases
network traffic with a higher degree of LRD could
show better performance than traffic with a lower
one, which is the reverse of the above findings.
This is because LRD reflects only the temporal structure
of traffic variability and not its spatial structure,
such as the marginal distribution. Grossglauser et
al. showed, using a fluid traffic model, that in
addition to LRD, the difference in marginal
distributions strongly affects network performance.
However, detailed information about marginal 
distributions of real network traffic and the way 
to describe them in their fluid traffic model was 
not given in the study. In general, if traffic is 
aggregated from a number of independent and 
identically distributed flows, the marginal
distribution of its variability is considered to be
Gaussian and this nature is guaranteed by the 
central limit theorem. Actually, in recent traffic 
models that reflect traffic LRD such as the well 
known fractional Brownian motion traffic model 
proposed by Norros or the one proposed by 
Willinger and Taqqu [^], which is a superposi- 
tion of a large number of independent ON/OFF 
sources with heavy-tailed ON and/or OFF pe- 
riods, their marginal distribution of traffic vari- 
ability is Gaussian. 

In this paper, we investigate marginal distri- 
butions of network traffic using some Internet 
traffic data. We found that marginal distributions
of traffic variability are not always Gaussian
(i.e., they are non-Gaussian) and that in many
cases they are positively skewed. We also show
that the non-Gaussian nature has a strong influence
on network performance. This means that
it is essential to consider the non-Gaussian na- 
ture of network traffic in order to characterize 
traffic variability. Thus, we focus on the mecha- 
nisms that cause non-Gaussian nature of traffic 
variability. To study this, we analyze the behav- 
ior of each IP flow composing aggregated traf- 
fic from the viewpoints of size distribution, hop 
counts, RTT, and protocol. 

This paper is organized as follows. Section 
2 shows examples of the non-Gaussian nature 
of network traffic and its implication on net- 
work performance. In section 3, to identify the 
mechanisms causing non-Gaussian nature, we 
define 'per-time-unit flow' and analyze its statis- 
tics such as size distribution, hop counts, RTT, 
and protocol. Section 4 gives our conclusions. 

2 Examples of Non-Gaussian
Nature and Performance Implication

This section shows examples of the non-Gaussian 
nature of network traffic using real Internet traf- 
fic traces and then shows its performance impli- 
cations by trace driven simulation. 

2.1 Data 

We used traces from three different sets of net- 
work traffic for our analyses. In this work, the 
length of all traces was set to 300 s to avoid 
the effect of non-stationarity and also to get 
enough statistics. Actually, each trace had at 
least 100,000 packets in this condition. Details 
about the traces are briefly summarized as fol- 
lows. 

Data I: ECL external line 

This line is the main external connection line of 
NTT R&D center (ECL). It is a 12-Mbps ATM 
line and traces were captured at the segment 
one hop before the line. The measurements were 



made during daily busy hours on some weekdays 
in July 2001. In total we used 48 traces for this 
study. 

Data II: OCN-SINET 

This line connects NTT's Open Computer Net- 
work (OCN) and the Science Information Net- 
work (SINET). OCN is NTT's commercial Inter- 
net backbone network and SINET is the largest 
Internet backbone network for scientific research 
institutes in Japan. The link is a 135-Mbps ATM 
line. The measurements were made during daily 
busy hours on some weekdays in January 2000. 
In total we used 34 traces for this study. More 
detailed information about this data is available 
in@. 

Data III: Bellcore 

The lines are several Ethernet networks at the 
Bellcore Morristown Research and Engineering 
facility. Traces are available from the Internet 
Traffic Archive ||T^. In this study, we used the 
first 300 s of BC-pAug89.TL. Detailed informa- 
tion about the traces is shown in Q, where the 
self-similarity of Ethernet traffic was first demon- 
strated using data that included this data set. 

2.2 Traffic variability and marginal 
distributions 

For all traces described in 2.1, we calculated the 
throughput variability and their marginal distri- 
butions. Here, we calculated throughput using a 
time interval of 0.1 s. Figure 1 shows throughput
variability (left side) and marginal distributions
(right side) for randomly chosen traces
from the three networks described in 2.1. From 
this figure, we can intuitively find that marginal 
distributions of all traces are asymmetric and 
skewed positively. To characterize the difference 
in marginal distributions quantitatively, we used 
skewness, which is defined as 



$$\mathrm{skewness} = \frac{\langle (X - \langle X \rangle)^3 \rangle}{\sigma^3} \qquad (1)$$

where $\langle X \rangle$ is the mean of $X$ and $\sigma$ is the standard
deviation of $X$. If the distribution is skewed positively
(negatively), skewness is positive (negative).
If the distribution is exactly Gaussian, the
shape of the marginal distribution is symmetric
and the skewness is 0. Table 1 shows the skewness of
the three example traces given in Fig. 1. All values
of skewness were positive. Among these three
example traces, the traffic variability
of Data III has the strongest non-Gaussian
nature.

Table 1: Skewness of three example traces.

            Data I    Data II   Data III
skewness    0.812     0.655     1.320
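As an illustration of equation (1), the following minimal sketch computes a throughput series in 0.1 s bins and its skewness. The function names and the input layout (sorted packet timestamps in seconds and packet sizes in bytes) are assumptions made for this example, not part of the measurement tools used in the study.

```python
def throughput_series(timestamps, sizes_bytes, bin_width=0.1):
    """Throughput in bits per second for consecutive bins of bin_width seconds.

    Assumes timestamps are sorted in ascending order (hypothetical input layout).
    """
    start = timestamps[0]
    n_bins = int((timestamps[-1] - start) / bin_width) + 1
    bits = [0.0] * n_bins
    for t, size in zip(timestamps, sizes_bytes):
        bits[int((t - start) / bin_width)] += 8.0 * size
    return [b / bin_width for b in bits]


def skewness(x):
    """Third central moment divided by the cube of the standard deviation."""
    n = len(x)
    mean = sum(x) / n
    variance = sum((v - mean) ** 2 for v in x) / n
    third_moment = sum((v - mean) ** 3 for v in x) / n
    return third_moment / variance ** 1.5
```

A symmetric sample gives a skewness of 0, while a sample with a few large values and many small ones gives a positive skewness.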

2.3 Non-Gaussian nature and network 
performance 

For Data I and II, we performed trace-driven 
simulation to see the effect of non-Gaussian 
nature on network performance in conditions 
having similar bandwidth and buffer capacity. 
We used Internet-to-ECL traffic for Data I and 



OCN-to-SINET traffic for Data II because these 
directions have more traffic than the opposite 
directions. We did not use Data III because 
we could not clarify the direction of traffic from 
given traces. 

We also show the relationship between the 
Hurst parameter and network performance for 
comparison. The Hurst parameter, H, is be- 
tween 0.5 and 1, where a large value means a 
high degree of LRD |^,||9|. To estimate H, we 
employed power spectrum density estimation, re- 
moving the linear trend from the throughput 
time series before employing a Fourier transform 
(Fig. 2). From the slope $\alpha$ of the log-log regression
of the power spectrum density versus
the frequency, we get H = |ll,@. Here, 
the lowest 10% of frequencies was used for the regression.
Table 2 shows the estimated Hurst parameters
of the three example traces given in Fig. 1. All values
are larger than 0.5, which indicates that all
traces have LRD.
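The estimation procedure described above (linear detrending, Fourier transform, log-log regression over the lowest 10% of frequencies) can be sketched as follows. This is only an illustration: the helper names are hypothetical, a naive DFT is used instead of an FFT library, and the conversion H = (1 - slope) / 2 is an assumption of this sketch, namely the standard relation for fractional Gaussian noise, whose spectrum behaves like f^(1 - 2H) at low frequencies.

```python
import cmath
import math


def detrend_linear(x):
    """Subtract the least-squares straight line from the series x."""
    n = len(x)
    t_mean = (n - 1) / 2.0
    x_mean = sum(x) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x))
    slope = sxy / sxx
    return [v - (x_mean + slope * (t - t_mean)) for t, v in enumerate(x)]


def periodogram_low(x, dt, n_freqs):
    """Periodogram at the n_freqs lowest positive frequencies (naive DFT)."""
    n = len(x)
    freqs, power = [], []
    for k in range(1, n_freqs + 1):
        s = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))
        freqs.append(k / (n * dt))
        power.append(abs(s) ** 2 / n)
    return freqs, power


def loglog_slope(freqs, power):
    """Least-squares slope of log(power) versus log(frequency)."""
    pts = [(math.log(f), math.log(p)) for f, p in zip(freqs, power) if p > 0]
    mx = sum(a for a, _ in pts) / len(pts)
    my = sum(b for _, b in pts) / len(pts)
    return sum((a - mx) * (b - my) for a, b in pts) / sum((a - mx) ** 2 for a, _ in pts)


def hurst_from_psd(x, dt=0.1, low_fraction=0.1):
    """Hurst estimate from the low-frequency slope of the power spectrum."""
    n_freqs = max(2, int((len(x) // 2) * low_fraction))
    freqs, power = periodogram_low(detrend_linear(x), dt, n_freqs)
    slope = loglog_slope(freqs, power)
    # ASSUMPTION: fractional-Gaussian-noise relation S(f) ~ f^(1 - 2H).
    return (1.0 - slope) / 2.0
```

Under this assumed relation, a flat low-frequency spectrum (slope 0) gives H = 0.5, and a steeper negative slope gives a larger H.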

Figure 3 shows our simulation model. As the
network simulator, we used ns-2 [|l^ ]. The one- 
way trace data was set to 'Trace source' and 
Table 2: Estimated Hurst parameters of three
example traces.

     Data I    Data II   Data III
H    0.714     0.894     0.858


[Figure 3 labels: aggregation link, bandwidth = Bw]

Figure 3: Simulation environment.

packets were sent from it to 'Trace sink' via two 
nodes following the timestamp and packet size 
recorded in the trace data. In this study, there 
was no host-side traffic control scheme such as 
TCP; all packets were treated like UDP packets. 
This is because we wanted to see how packets 
would be discarded given a constrained band- 
width for originally demanded (non-shaped) traf- 
fic. Here, we set the buffer size of the 'ag- 
gregation link' to 50 packets to see the differ- 
ence in performance clearly, and used FIFO as 
the packet scheduling. The link bandwidth was 
changed so that the link utilization of each trace 
was 0.6. For example, if the average traffic vari- 
ability (throughput) of a trace was 6 Mbps, we 
set the bandwidth to 10 Mbps. 
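The simulation setting above can be approximated by a very small drop-tail FIFO model. The sketch below is not ns-2 and is not the simulation code used in this study; it only illustrates, under simplifying assumptions, how a packet loss ratio follows from the trace timestamps, the packet sizes, a 50-packet buffer, and a bandwidth chosen for a utilization of 0.6. The function names are hypothetical, and the buffer is counted in packets including the one in service.

```python
from collections import deque


def link_bandwidth_bps(timestamps, sizes_bytes, utilization=0.6):
    """Bandwidth such that the mean trace rate equals `utilization` of the link."""
    duration = timestamps[-1] - timestamps[0]
    mean_bps = 8.0 * sum(sizes_bytes) / duration
    return mean_bps / utilization


def packet_loss_ratio(timestamps, sizes_bytes, bandwidth_bps, buffer_packets=50):
    """Fraction of packets dropped by a drop-tail FIFO queue.

    Simplification: the buffer limit counts every packet still in the system,
    including the packet currently being transmitted.
    """
    departures = deque()             # departure times of packets in the system
    last_departure = float("-inf")
    dropped = 0
    for t, size in zip(timestamps, sizes_bytes):
        while departures and departures[0] <= t:
            departures.popleft()     # these packets have already left
        if len(departures) >= buffer_packets:
            dropped += 1             # buffer full: drop-tail
            continue
        start = max(t, last_departure)
        last_departure = start + 8.0 * size / bandwidth_bps
        departures.append(last_departure)
    return dropped / len(timestamps)
```

With a mean rate of 6 Mbps, `link_bandwidth_bps` returns about 10 Mbps, which matches the example in the text.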

For each trace, we calculated its skewness and 
Hurst parameter from its throughput variabil- 
ity, and performed the trace-driven simulation. 
Figure 4 shows the results, where each point corresponds
to the result of one trace. From the
figure, we can see that both the Hurst parameter
and skewness took a wide range of values for the two
networks. That is, most of the traffic had LRD and
was positively skewed (non-Gaussian). It also 



Table 3: Correlation coefficients of Hurst parameter
and skewness vs. packet loss ratio.

                   Data I    Data II
Hurst parameter    -0.015    0.169
skewness           0.584     0.435
Figure 4: Results of trace-driven simulation.
Left side shows Hurst parameter vs. packet loss 
ratio; right side shows skewness vs. packet loss 
ratio. Top is the result using Data I and bottom 
is Data II. 



shows that as skewness increased, network per- 
formance degraded as well. That is, as the non- 
Gaussian nature became stronger, the network 
performance became more degraded. To see
the relationship between network performance
and the above two characteristics (LRD and
non-Gaussian nature) quantitatively, we calculated the
correlation coefficients between them (Table 3). In
Fig. 4, solid lines indicate linear regression. The
table shows that skewness was positively correlated
with the packet loss ratio, while the Hurst
parameter had little correlation with it. These
results imply that the non-Gaussian nature of
traffic strongly affects the network performance.
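For reference, a correlation coefficient of the kind reported in Table 3 can be computed as in the following sketch. It assumes the usual Pearson definition, with one value per trace in each list (for example, skewness and packet loss ratio); the function name is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)
```

A value near 0 indicates little linear relationship, and a value near 1 indicates that the two quantities increase together.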

3 Mechanism of Non-Gaussian 
Nature 

This section investigates the mechanism of the 
non-Gaussian nature of network traffic. For this, 
we first introduce the 'per-time-unit flow', an IP
flow defined in a given time unit, to analyze the
behavior of IP flows composing the aggregated
traffic. Here an IP flow is a group of packets having
a unique combination of source IP address,
destination IP address, source port, destination 
port, and protocol as is defined in |jl^, ^. Then 
we show that 'greedy flows' strongly affect the 
non-Gaussian nature of network traffic. After 
that, we show the nature of greedy flows from the
viewpoint of hop count distributions. Investigating
the relationship between traffic variability
and hop count distributions will also enable us 
to establish efficient network bandwidth design 
according to its topology. To investigate the na- 
ture of greedy flows in detail, we also studied 
RTT and protocol distributions of per-time-unit 
flows. 

3.1 Definition of per-time-unit flow 
and greedy flows 

Many recent ON/OFF source traffic models (also 
known as packet train models) such as |]l^ and 
1^] assume that traffic is aggregated from a num- 
ber of flows having a uniform rate. Accordingly, 
each flow has a similar size on a certain time scale 
when the flow is in the ON-period. It should 
also be pointed out that in these models, aggre- 
gated traffic shows the Gaussian nature accord- 
ing to the central limit theorem, as mentioned 
in section 1. However, it is not clear that the 
above "uniform rate assumption" is appropriate 
for modern Internet traffic. So we introduce 'per- 
time-unit flow' to investigate how traffic is ag- 
gregated on a certain time scale, and to see how 
the behavior of each flow contributes to the non- 
Gaussian nature of aggregated traffic. 

In Fig. 5, each square corresponds to one IP
flow. We divided traces into time units $T_i$, where
$1 \le i \le M$. For all $i$, the length of $T_i$ was set
to the time interval $\tau$. For each $T_i$, we define the
per-time-unit flow $fl_j(T_i)$ as shown by the shaded
regions in the figure, where $1 \le j \le N_{T_i}$ and
$N_{T_i}$ is the number of flows during $T_i$. A
per-time-unit flow $fl_j(T_i)$ should contain at least
two packets during $T_i$. In this work, $\tau$ was set
to 0.1 s.

Figure 5: Definition of per-time-unit flow.

For each per-time-unit flow $fl_j(T_i)$, we
counted the number of packets $N_p(fl_j(T_i))$. In
this study, we defined a 'greedy flow' as one
whose $N_p(fl_j(T_i))$ is larger than 20, which corresponds
to a throughput of about 1 Mbps assuming
the average packet size to be 700 bytes.
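The definitions of this section can be sketched in a few lines. In the following illustration, the packet record layout (timestamp followed by the five flow fields) and all function names are assumptions made for the example, and the time-unit index is 0-based here whereas the text counts from 1.

```python
from collections import Counter

TAU = 0.1               # length of one time unit in seconds (from the text)
GREEDY_THRESHOLD = 20   # packets per time unit (from the text)


def per_time_unit_flows(packets, tau=TAU):
    """Map (time-unit index, flow key) to the packet count N_p.

    packets: list of (timestamp, src_ip, dst_ip, src_port, dst_port, protocol).
    Only per-time-unit flows with at least two packets are kept.
    """
    start = min(p[0] for p in packets)
    counts = Counter()
    for ts, *flow_key in packets:
        i = int((ts - start) / tau)
        counts[(i, tuple(flow_key))] += 1
    return {key: n for key, n in counts.items() if n >= 2}


def greedy_flows(flows, threshold=GREEDY_THRESHOLD):
    """Per-time-unit flows whose packet count exceeds the threshold."""
    return {key: n for key, n in flows.items() if n > threshold}


def approx_throughput_bps(n_packets, mean_packet_bytes=700, tau=TAU):
    """Throughput implied by n_packets of the given mean size within one unit."""
    return n_packets * mean_packet_bytes * 8 / tau
```

For the threshold of 20 packets, `approx_throughput_bps(20)` gives 20 x 700 x 8 / 0.1 = 1.12 Mbps, which is consistent with the "about 1 Mbps" stated above.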

3.2 Size distribution of per-time-unit 
flow 

For Data I and II, we investigated the size distribution
of $N_p(fl_j(T_i))$. We calculated the following
complementary cumulative distribution
for all $i, j$.

$$P[N_p(fl_j(T_i)) > n_p] \qquad (2)$$

Figure 6 shows the log-log complementary
cumulative distribution (LLCD) plots of
$N_p(fl_j(T_i))$ for all $i, j$. As we can see immediately,
the figure shows that the distributions of
$N_p(fl_j(T_i))$ are in good agreement with the
power law; that is,

$$P[N_p(fl_j(T_i)) > n_p] \sim n_p^{-\alpha}, \quad \text{as } n_p \to \infty. \qquad (3)$$

Estimated power exponents $\alpha$ of equation (3)
for the traces given in Fig. 1 are 2.96 for Data I
and 1.83 for Data II, where the regression range
was set to $n_p > 10$. Correlation coefficients
for the regressions are -0.99 for both Data I and II.
The power law of the distributions of $N_p(fl_j(T_i))$
indicates that both traces described in Fig. 6
had greedy flows with non-negligible probability
(right side of the dashed line in the figure).

Figure 6: LLCD plots of $N_p(fl_j(T_i))$ for two
traces given in Fig. 1.

Figure 7: Exponents $\alpha$ vs. skewness. Left side is
Data I and right side is Data II.

It should also be pointed out that as $\alpha$ approaches
2, the distribution of $N_p(fl_j(T_i))$ approaches
a heavy-tailed distribution, which indicates
that very large values existed with non-negligible
probability.
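The exponent of equation (3) can be estimated from the empirical complementary cumulative distribution by a log-log regression restricted to $n_p > 10$, as in the following sketch. The function names are hypothetical, and a plain least-squares fit is assumed.

```python
import math
from bisect import bisect_right


def ccdf_points(values):
    """Pairs (x, P[N_p > x]) for each distinct observed value x, where P > 0."""
    data = sorted(values)
    n = len(data)
    points = []
    for x in sorted(set(data)):
        p = (n - bisect_right(data, x)) / n   # empirical P[N_p > x]
        if p > 0:
            points.append((x, p))
    return points


def power_exponent(values, n_min=10):
    """Estimate alpha in P[N_p > n] ~ n^(-alpha) using points with n > n_min."""
    pts = [(math.log10(x), math.log10(p))
           for x, p in ccdf_points(values) if x > n_min]
    mx = sum(a for a, _ in pts) / len(pts)
    my = sum(b for _, b in pts) / len(pts)
    slope = (sum((a - mx) * (b - my) for a, b in pts)
             / sum((a - mx) ** 2 for a, _ in pts))
    return -slope
```

Plotting the points returned by `ccdf_points` on log-log axes gives an LLCD plot; a straight line of slope $-\alpha$ corresponds to the power law.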



3.3 Greedy flows and non-Gaussian 
nature 

In order to see the relationship between greedy 
flows and the non-Gaussian nature of network 
traffic, we investigated the relationship between 
estimated power exponents $\alpha$ of equation (3) and
the skewness of throughput variability for each
trace. In Fig. 7, the estimated power exponents
$\alpha$ are plotted against skewness for all traces of
Data I and II. The figure shows that in both
cases, skewness increased as $\alpha$ decreased, and
this tendency was stronger when $\alpha$ was close to 2
(inside ellipses and dashed lines). These results
lead to the conclusion that greedy flows existing
with non-negligible probability contribute to the
non-Gaussian nature of network traffic, because
a decrease in $\alpha$ corresponds to an increase in
the probability of a greedy flow existing and an
increase in skewness corresponds to an increase
in the degree of non-Gaussian nature of network
traffic.



Footnote (heavy-tailed distribution): The distribution of $X$ is heavy-tailed if

$$P[X > x] \sim x^{-\alpha}, \quad \text{as } x \to \infty, \quad 0 < \alpha < 2.$$




Figure 8: Hop count estimation between two 
hosts. 

3.4 Hop count distribution of per-time-unit flow

For each per-time-unit flow $fl_j(T_i)$, we investigated
hop counts $\mathrm{hop}(fl_j(T_i))$. We used only
Data I because traces of Data II contain only 
one-way traffic and we could not estimate hop 
counts of each flow with our method described 
below. To study hop counts between two nodes 
from the given trace data, we used the TTL (time 
to live) field of an IP packet. As its value is de- 
creased when an IP packet passes a router, we 
can estimate hop counts between the source node 
and measuring point from the initial TTL value 
and the TTL value of the received packet. So, 
if we can obtain the hop counts from both the 
source and destination nodes to the measuring 
point, we can estimate the hop counts between 
these nodes. We show an example below. 

In Fig. 8, client C and server S compose an
IP flow. Let the initial TTL values of the two hosts
be 128 and 64. If we observe TTL values of
125 and 62 at the measuring point, the hop count
between these two nodes can be estimated as
(128 - 125) + (64 - 62) + 1 = 6, where we assume
that the route between the two nodes does not
change during a round-trip. One difficulty with
this approach is that the initial TTL values depend
on the operating system or on network equipment
such as routers. To overcome this difficulty, we
employed the technique introduced by Fujii et al.;
that is, we assumed initial TTL values to be 32, 64,
128, and 255, and ignored other initial TTL values
such as 30 or 60 because systems that create such
non-$2^n$ related initial TTL values can be considered
to be rather out-of-date and unusual in today's
network. For a measured TTL value, we choose
the closest of the above four values that is not
smaller than it, and use it as the initial TTL value. For
example, if we receive a packet with a TTL
value of 45, we assume the initial TTL value of
the packet to be 64.

Figure 9: $\mathrm{hop}(fl_j(T_i))$ vs. $N_p(fl_j(T_i))$ for the
trace of Data I given in Fig. 1.

We investigated hop counts $\mathrm{hop}(fl_j(T_i))$ of
each per-time-unit flow $fl_j(T_i)$ for all traces of
Data I, using the hop count estimation methodology
described above. Here, we removed flows
having multiple hop counts for one source IP
address. Figure 9 shows the relationship between
$\mathrm{hop}(fl_j(T_i))$ and $N_p(fl_j(T_i))$ for the
trace given in Fig. 1. Hop counts of greedy flows
(above the dashed line) can be considered to be
smaller than those of all flows. Figure 10 shows a
histogram of $\mathrm{hop}(fl_j(T_i))$ for (a) all per-time-unit
flows and (b) greedy per-time-unit flows,
where we used all traces of Data I. Average hop
counts for greedy flows were smaller than those
of all flows, and most greedy flows had relatively
smaller hop counts. Actually, average hop counts
were 20.84 for all flows and 18.22 for greedy flows
(see dashed lines in Fig. 10).
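The TTL-based hop count estimate used in this section can be written compactly. The sketch below follows the rule stated in the text (candidate initial TTL values of 32, 64, 128, and 255, choosing the closest candidate that is not smaller than the observed value); the function names are hypothetical.

```python
INITIAL_TTLS = (32, 64, 128, 255)   # candidate initial TTL values from the text


def initial_ttl(observed_ttl):
    """Smallest candidate initial TTL that is not smaller than the observed TTL."""
    for candidate in INITIAL_TTLS:
        if observed_ttl <= candidate:
            return candidate
    raise ValueError("observed TTL exceeds 255")


def hops_to_measuring_point(observed_ttl):
    """Number of routers passed between the sender and the measuring point."""
    return initial_ttl(observed_ttl) - observed_ttl


def hop_count_between(ttl_from_a, ttl_from_b):
    """Hop count between two hosts from the TTLs observed in each direction."""
    return (hops_to_measuring_point(ttl_from_a)
            + hops_to_measuring_point(ttl_from_b) + 1)
```

With the values of the example in the text, `hop_count_between(125, 62)` evaluates to (128 - 125) + (64 - 62) + 1 = 6, and `initial_ttl(45)` evaluates to 64.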

From these results, we conclude that greedy 
flows had smaller hop counts than those of all 
flows in our study. This is because the RTTs of 
flows with smaller hop counts are assumed to be 
smaller statistically, as demonstrated in ||2|, and 
TCP flows with smaller RTTs can quickly make 
their window size larger (i.e., can become greedy) 
following the mechanism of TCP flow control. 
Accordingly we can assume that if TCP flows 
having smaller hop counts (i.e., smaller RTTs) 
exist with non-negligible probability, then the 
aggregated traffic shows a non-Gaussian nature. 
To verify the above assumptions, we examine the 
relationship between hop counts and RTT, and 
protocol distribution for our trace data in the fol- 
lowing two sections. Our goal is to verify the lin- 
ear relationship between hop counts and RTTs, 
and protocol breakdown for trace data used in 
this study. 



3.5 Relationship between hop counts 
and RTTs 

Here, we introduce the technique for estimating
RTTs from given passively measured traces
and then give results of analyses for our trace
data. Our approach is based on an existing technique
that analyzes the packets of TCP's 3-way
handshake. In addition to the original description,
we give a more detailed description
of the technique. Figure 11 diagrams the handshake
between hosts $\alpha$ and $\beta$. For convenience, we call a
TCP packet with a SYN (SYN and ACK, ACK)
flag bit on a SYN (SYN+ACK, ACK) packet.
In our study, we measured traffic at measuring
point M between the two hosts.

Footnote (flows with multiple hop counts): This is caused by a change in routing or an intentional
change in the initial TTL value made by some special applications
such as traceroute.

Footnote (Fig. 10(b)): The peak at a hop count of 18 is due
mainly to one long-lived greedy IP flow, which caused a
large number of greedy per-time-unit flows. Actually, the
number of greedy per-time-unit flows coming from this
IP flow was 7,025, while the total number of greedy per-time-unit
flows was 33,817. When we removed this IP flow, the
average hop count for greedy flows became 18.28, which
indicates that the existence of the IP flow did not affect
the result.

Footnote (3-way handshake): The basic 3-way handshake for connection synchronization
is defined in RFC 793 [4].

Footnote (RTT estimation assumption): Of course this assumption is not always appropriate;
that is, conditions of host and network are always changing
and RTTs for the same flows also fluctuate. So, to
avoid the effect of fluctuations we used the average value
of RTTs for statistical study.

Figure 10: Histogram of $\mathrm{hop}(fl_j(T_i))$ for all
traces of Data I. (a) is for all per-time-unit flows
and (b) is for greedy ones. The dashed lines indicate
average hop counts.

Figure 11: Diagram of TCP's 3-way handshake.

First, host $\alpha$ sends a SYN packet in order to
request connection establishment. Let the time
at this moment be $t_\alpha(S)$. The SYN packet
passes the measuring point at $t_M(S)$, and is received
by host $\beta$ at $t_\beta(S)$. Immediately upon
receiving the SYN packet, host $\beta$ sends back a
SYN+ACK packet at $t_\beta(SA)$, and it is received
by host $\alpha$ at $t_\alpha(SA)$. Similarly, host $\alpha$ immediately
sends back an ACK packet at $t_\alpha(A)$, which
passes measuring point M at $t_M(A)$ and
is received by host $\beta$ at $t_\beta(A)$, ending the
negotiation. Assuming that the delay caused
by transactions of each host is quite small (i.e.,
$t_\beta(S) \approx t_\beta(SA)$, $t_\alpha(SA) \approx t_\alpha(A)$) and there is
no queueing delay caused by network congestion,
the RTT between hosts $\alpha$ and $\beta$ can be estimated as
$t_M(A) - t_M(S)$, as described in the figure.



Using the above approach, we estimated RTTs
of IP flows that contain TCP's 3-way handshake,
where we removed flows that contain duplicated
SYN and SYN+ACK packets to estimate RTTs exactly.
In the trace of Data I given in Fig. 1, the total
number of IP flows in the Internet-to-ECL direction
was 16,358. We could estimate RTTs for
6,834 of them and estimate both RTTs and hop
counts for 1,987 of them.
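The handshake-based RTT estimate, including the removal of flows with duplicated SYN or SYN+ACK packets, can be sketched as follows. The input layout (a list of timestamped handshake packets observed at the measuring point for one connection, labeled "SYN", "SYN+ACK", or "ACK") and the function name are assumptions made for this illustration.

```python
def estimate_rtt(handshake):
    """Estimate the RTT of one TCP connection as t_M(A) - t_M(S).

    handshake: list of (timestamp, kind) observed at the measuring point,
    where kind is "SYN", "SYN+ACK", or "ACK" (hypothetical labels).
    Returns None when the handshake is incomplete, out of order, or contains
    a duplicated SYN or SYN+ACK (such flows are removed, as in the text).
    """
    syn = [t for t, kind in handshake if kind == "SYN"]
    syn_ack = [t for t, kind in handshake if kind == "SYN+ACK"]
    if len(syn) != 1 or len(syn_ack) != 1 or syn_ack[0] < syn[0]:
        return None
    acks = [t for t, kind in handshake if kind == "ACK" and t >= syn_ack[0]]
    if not acks:
        return None
    # The first ACK after the SYN+ACK closes the handshake.
    return min(acks) - syn[0]
```

Between the SYN and the ACK passing the measuring point, the packets travel from M to one host and back, and then from M to the other host and back, so under the stated assumptions the interval covers one full round trip.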

Figure 12 shows (a) the relationship between
hop counts and number of IP flows and (b) the
relationship between hop counts and the average
of estimated RTTs, where we collected estimated
RTTs per hop count and used their average
as a representative value. The correlation
coefficient of the average of the estimated RTTs
and hop counts was 0.93, where the regression
range of hop counts $h$ was set to $14 < h < 30$,
in which the number of IP flows exceeds 1% (inside
dashed lines). The result indicates that hop
counts and RTTs are in good agreement with
a linear relationship. Thus, it was verified that
statistically, IP flows with smaller (larger) hop 
counts have smaller (larger) RTTs for trace data 
used in this study. This suggests that if the IP 
flow has smaller hop counts and its protocol is 
TCP, it can be greedier, following the mecha- 
nisms of TCP as mentioned in the previous para- 
graph. In the next section, we show the protocol 
distribution for trace data. 
Figure 12: (a) Hop counts vs. number of IP
flows and (b) hop counts vs. average of estimated 
RTTs. 



3.6 Protocol distribution 



For each per-time-unit flow $fl_j(T_i)$, we investigated
the protocol. Table 4 shows the protocol
distribution of all per-time-unit flows and greedy 
per-time-unit flows for Data I and II. The results 
show that for both cases, most of their protocols 
were TCP. That is, most per-time-unit flows fol- 
lowed TCP's flow control mechanism, and if flows 
had smaller RTTs (i.e., smaller hop counts as 
shown in previous section), they could quickly 
make their window size larger and be greedy in 
a short time, leading to the non-Gaussian nature 
of aggregated network traffic. As today's most 
popular Internet applications such as WWW are 
based on TCP (HTTP), this implication is very 
important. 



4 Conclusion 

In this work, we showed the performance implica- 
tion of the non-Gaussian nature of network traf- 
fic and analyzed its mechanisms using some In- 
ternet traffic data. Our main findings are that 
(1) the non-Gaussian nature degrades network 
performance, (2) it is caused by greedy flows
existing with non-negligible probability, and (3)
a large majority of greedy flows are TCP flows 
having relatively small hop counts, which cor- 
respond to small RTTs. Accordingly, we con- 
clude that in a network that has greedy flows 
with non-negligible probability, a traffic control- 
ling scheme or bandwidth design that considers 
non-Gaussian nature is essential. 

We expect that detecting non-Gaussian factors 
will allow us to propose practical methodologies 
for traffic engineering. We show some examples 
below. As we have shown in section 3, the behavior
of each IP flow is related to its hop count;
that is, IP flows with smaller hop counts tend
to be greedier than ones with larger hop counts.
So, classifying IP flows by their hop counts
(e.g., using TTL fields) at routers will be useful
for traffic engineering. For instance, decreasing
the queueing priority of IP flows with smaller
hop counts will decrease the number of possible
greedy flows. The decrease in the number
of greedy flows will bring the nature of aggregated
traffic closer to Gaussian, so that network
performance will be improved for a given utilization
and buffer capacity, as we showed in Fig. 4.
It will also lead to the establishment of fairness 
among IP flows. Another example is to clarify 
the relationship between traffic characterization 
and network topology as mentioned in section 3. 
Efficient network design according to its topology 
will be established by this study. That is, from 
the hop count distribution obtained by analyzing 
the network topology, we can estimate whether 
the network is likely to have greedy flows (i.e., 
IP flows having smaller hop counts). If it does 
with non-negligible probability, then the aggre- 
gated traffic will show non-Gaussian nature and 
bandwidth design considering the effect of non- 
Gaussian nature will be effective for efficient op- 
eration of the network. 
Table 4: Protocol distribution for all per-time-unit
flows and greedy ones.

          Data I                Data II
          All        Greedy     All        Greedy
TCP       97.00 %    99.90 %    95.76 %    95.36 %
UDP       1.02 %     0.10 %     4.20 %     4.62 %
Other     1.98 %     0.00 %     0.04 %     0.01 %

[4] Postel, J., Editor, "Transmission control
protocol", STD 7, RFC 793, September 1981.



We consider that the existence of greedy flows 
is due to the heterogeneity of the Internet. As 
shown in section 3.2, flows were aggregated in 
various manners on a certain time scale. Actually,
they followed a power law (Fig. 6).
We expect that this power law mainly comes
from (a) the heterogeneity of network topology,
which is partly found in hop count distributions,
or (b) the heterogeneity of user links, which
range from low-speed links such as analog modems
to high-speed links such as gigabit Ethernet.
So, in the Internet, there exist various kinds
of IP flows and this diversity leads to greedy 
flows existing with non-negligible probability. As 



pointed out in |1C], learning the characteristics 
of the Internet is an immensely challenging un- 
dertaking because of the network's great hetero- 
geneity and rapid changes. However, we believe 
that seeking some invariant characteristics in the 
Internet such as self-similarity or non-Gaussian 
nature or power-law of per-time-unit flows will 
help us to build practical models of it and pro- 
pose methodologies for operating it efficiently. 